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ABSTRACT 


Efficient  algorithms  for  11  and  13-point  DFT's  are  presented.  A more 
efficient  algorithm,  compared  to  earlier  published  versions,  for  the  computa- 
tion of  9-point  DFT  is  also  included.  The  effect  of  arithmetic  roundoff  in 
implementing  the  prime  factor  and  the  nested  algorithms  for  computing  DFT  with 
fixed  point  arithmetic  is  analyzed  using  a statistical  model.  Various  aspects 
of  the  prime  fac^  **,  the  nested  and  the  radix-2  FFT  algorithms  are  compared. 

A processor-based  hardware  implementation  of  the  prime  factor  algorithm  is 
discussed. 
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CHAPTER  I 
INTRODUCTION 

The  discrete  Fourier  transform  (DFT)  of  a sequence  x^,  k = 0,  1, 

N - 1,  is  defined  as  [1] 

N- 1 

Xn  =]C  XkWNk  * n = 0,  1,  ...,  N-l  (1-1) 

k = 0 

where  = exp  (-j  2tt/N)  . It  is  an  invertible  transformation  and  the  sequence 
x^  can  be  recovered  using  the  following  inverse  DFT  (IDFT)  relation: 

N - 1 

\ = XnWNnl<  ’ k = 0,  1 N-l  (1-2) 

n = 0 

Many  digital  signal  processing  systems  employ  the  DFT  for  a variety  of 
applications.  The  design  and  implementation  of  digital  filters,  spectral  ana- 
lysis of  signals,  and  detection  of  targets  from  radar  echoes  are  a few  examples. 

As  a result,  there  is  a continued  interest  in  finding  increasingly  sophisticated 
algorithms  for  computing  the  DFT. 

2 

The  direct  evaluation  of  Xn  in  Eqation  (1  • 1)  requires  about  N multi- 
plications and  additions  (hereafter  these  two  arithmetic  operations  are  referred 
to  simply  as  "operations").  If  N is  a composite  number,  however,  the  compu- 
tational burden  can  be  reduced  by  employing  one  of  the  three  following  algorithms: 

(a)  Fast  Fourier  transform  (FFT)  algorithm 

(b)  Prime  factor  algorithm  (PFA) 

(c)  Nested  or  Winograd  Fourier  transform  algorithm  (WFTA) 
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The  factors  of  N have  to  be  relatively  prime  for  the  PFA  and  the  WRA, 
while  no  such  constraint  is  imposed  in  the  case  of  the  FFT.  These  algo- 
rithms are  briefly  discussed  in  Chapter  II. 

Efficient  algorithms  to  find  the  DFT  of  short  sequences  are  needed  to 
obtain  the  transform  of  longer  sequences  by  the  PFA  and  the  WFTA.  An 
efficient  procedure  for  deriving  the  Short  length  DR  algorithms,  when  the 
transform  length  is  a prime  or  a prime  power,  is  discussed  in  [2].  Algo- 
rithms for  several  short  length  transforms  derived  using  this  method,  are 
given  in  [3]  and  [4].  In  Chapter  III  this  procedure  is  briefly  explained, 
and  efficient  algorithms  for  9,  11  and  13-point  DFT's  are  presented.  The 
short  length  DFT  algorithms  given  in  [3]  are  also  listed. 

The  DFT  algorithms  can  be  either  programmed  on  general  purpose  digital 
computers  or  implemented  directly  by  special  purpose  hardware.  In  either 
case,  finite  word  length  arithmetic  is  used.  This  introduces  error  in  the 
computation  of  the  DFT.  It  is  difficult  to  evaluate  precisely  the  magnitude 
of  this  error.  However,  by  making  certain  assumptions  about  the  nature  of 
errors  introduced,  a simple  statistical  model  can  be  developed,  and  error 
bounds  can  be  obtained.  The  derivation  of  such  bounds  in  the  case  of  the  FFT 
has  been  studied  extensively  and  is  discussed  in  [1]  and  [5].  Error  bounds 
for  the  prime  factor  and  the  nested  methods  of  computing  the  DFT,  when  fixed- 
point  arithmetic  is  employed,  are  derived  in  Chapter  IV. 

Various  aspects  of  the  FFT,  the  PFA  and  the  WFTA  are  compared  in  Chapter 
V.  A processor-based  hardware  implementation  of  the  PFA  is  discussed  in 
Chapter  VI.  The  results  are  summarized  in  Chapter  VII. 
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CHAPTER  II 

DFT  ALGORITHMS  FOR  COMPOSITE  N 


1.  Fast  Fourier  Transform  Algorithm  [6] 


Let  N = r^  • r2 


Expressing  the  indices  n and  k in  Eq.  (1-1)  as 


n = n1r1  + , n,  = 0,1 . . ,r9-l 


n 2 ~ 0, 1,..., r i — 1 


k = k^r2  + k2  , k^  0,1,...,r-j  - 1 


k2  ~ 0,1 . . ,r2  ~ 1 


and  representing  the  sequences  Xn  and  as  two  dimensional 
(!• 1 ) can  be  rewritten  as 


rz-l  r,  -1 


nk„  nk,r. 


V"' 

X(nl  ’n2  = WN  WN  x( 


k2  = 0 k1  = 0 


Since 


nk,r,  k,n9 

u 1 ^ _ u 1 t 

WN  Wr1 


Eq.  (2*2)  can  be  simplified  to  yield 


r2 ' 1 

„ , . V u1"1’'1  * "2)k2 

X(nj  ,n2)  - 


k2  = 0 


A(n2,k2) 


(2.1) 


arrays.  Eq. 


(2-2) 


(2*3) 


(2-4) 
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where 

ri-'  kn 

A(n2,k2)  = £ Wr'  2 x<k,  .k2)  (2-5) 

k1  » 0 1 

2 

The  A array  can  be  obtained  in  operations  and  from  this,  the  sequence 

2 — 

Xn  can  be  calculated  in  r^  operations.  Therefore,  the  N-point  DR  can 

be  computed  in  N(r^  + r^)  operations.  In  general,  if  N = r^  x r^  x ...  x rL  , 

this  method  requires  N(r^  + r2  + ...  + r^)  operations  to  compute  the  N-point 

DFT.  A flow  graph  for  a 6-point  DFT  computation  by  this  method  is  shown  in 

Fig.  1.  The  terms  of  the  form  (i  = 1,2,3)  in  Fig.  1 are  called 

"twiddle  factors." 


2.  Prime  Factor  Algorithm  [7] 

Let  N = r^  • r2  and  assume  that  r^  and  r 2 are  mutually  prime  (that 
is,  the  greatest  common  divisor  of  r^  and  r 2>  denoted  by  GCD(r^,r2),  is 
equal  to  1 ) . 

Then,  the  indices  n and  k of  Eq.  (1*1)  can  be  expressed  as 


k = k^r]  + k2r2  (mod  N),  ^ = 0,1, ...,r2  - 1 

k2  = 0,1 , . . . , r^  - 1 


(2*6) 


n=  r2  s i + n^  s2(mod  N) , 
where 

n^  = n(mod  r-j ) , 
n2  = n (mod  r2 ) , 


n « 0,1 , . . . ,N-1  (2*7) 


n1  = 0,1 rl  " 1 

(2-8) 

n2  — 0,1 ... . ,r2  - 1 
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and  s-j.s,,  are  solutions  of 


5^2  5 1 (mod  r-j ) 
s2r1  = 1 (mod  r2) 


(2*9) 


Again,  representing  the  sequences  and  xn  as  two  dimensional  arrays,  Eq. 
(1-1)  can  be  rewritten  as 


r,-l  r2-1 


x'"l  ■n2)  * £ S 


k2  = 9 k-j  = 0 


W. 


( k i r i + ^ 2 r2  ^ ^ n 1 r2s 1 + n2r 1 S2 ^ 


x ( k-j  ,k2) 


(2-10) 


Since 


,rlr2klnlSl 


wrlr2k2S2  . 1 

WN  1 


(2*11) 


Eq.  (2*10)  can  be  simplified  to  get 


V1 


n,k. 


x(nrn2)  Wr]  2 A(n2,k2) 

k2=o  1 


(2*12) 


where 


r?-  1 

i-.  k,n 

Mn2>k2)  = V Wr'  £ x(krk2) 
k]  = 0 2 


(2*13) 


That  is,  the  N-point  DFT  can  be  viewed  as  r^  r2-point  DFT’s  followed  by 
r2  r^-point  DFT's.  If  the  r-j  and  r2-potnt  DFT  computations  require  and  M2 


M,  M 

operations  respectively,  the  N-poir.t  DFT  can  be  obtained  in  N(—  + — ) 

rl  r2 

operations. 

The  essential  difference  between  the  PFA  and  the  FFT  are  the  following: 

a.  The  factors  of  N must  be  mutually  prime  in  the  PFA 

b.  There  are  no  twiddle  factors  in  the  PFA 

c.  The  index  mapping  which  converts  the  one-dimensional  arrays  in  Eq.  (1*1) 
into  two-dimensional  {in  general,  multidimensional)  arrays  is  different 
for  the  two  cases. 

Fig.  2 depicts  the  computation  of  a 6-point  DFT  using  the  PFA. 


3.  Nested  Algorithm  [3] 

The  DFT  relation  in  Eq.  (1*1)  can  oe  represented  in  the  matrix  form 


X = Wx 


where 


X = 

• X X * 

- ' o 

J 

» 

x = 

i 

o «— 

1 x X 

>1 

XN  - 1 

(2-14) 


and  W is  the  N-point  DFT  matrix,  the  (i,j)^  element  of  which  is  equal  to 
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GCD  (rr  r2)  = 1 

= r*i  — point  DFT  matrix 
W2  = r2  ' P01 * * * S'nt  &FT  matrix 

Then,  Eq.  (2-14)  can  be  rewritten  as 


* = Pout 


(Wi 


w2) 


where  P . and  P.  are  two  NxN  permutation  matrices,  and 
out  in 

"Kronecker  product"  operation. 

' -1  ' 

Furthermore,  let  X = Pout  X and  x = Pin  x . 

I l 

Then  Eq.  (2  15),  in  terms  of  X^  and  x , becomes 


X*  = (W1  * W2)  . x* 


It  is  possible  to  decompose  W as 

W = SCT 


where 

T - H x N incidence  matrix  (that  is,  a 
its  elements  taking  values  -1,  0 

C = M x M diagonal  matrix 

S = N x M incidence  matrix 


(2-15) 

* denotes  the 

(2-16) 

(2-17) 

(2-18) 

matrix  which  has 
or  1 only) 
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It  is  easy  to  see  that  multiplication  of  a column  vector  by  S or  T 
requires  only  additions,  whereas  M multiplications  are  required  to  obtain 
the  product  of  an  arbitrary  vector  and  C.  Therefore,  M multiplications  are 
sufficient  to  evaluate  an  N-point  DFT.  It  should  be  noted  that  for  small 
values  of  N,  M is  approximately  equal  to  N. 

Decomposing  and  in  a similar  manner,  Eq.  (2-1/)  can  be  rewritten  as 

x'  = (S-j C-j T-j  * S2C2T2)  x'  (2*19) 

where 

W.  = S.C.T.  , i = 1,2  (2-20) 

Since 

AB  * CD  = (A  * C)  (B  * D)  (2-21) 


Eq.  (2  19)  becomes 


X = (S1  * S2)  • (C,  * C2)  • (T]  * T2)  • x 


(2-22) 


If  the  r^-point  DFT  requires  multiplications  and  A1  additions,  and  the 
r2*point  DFT  requires  M?  multiplications  and  A2  additions,  the  N-point  DFT 
can  be  obtained  in  M^M0  multiplications  and  A^N2  + additions.  Since 

the  Kronecker  product  operat.un  is  not  commutative,  the  number  of  additions 
required  depends  on  the  order  in  which  r^  and  r2  are  chosen.  Therefore, 
the  order  that  minimizes  the  number  of  additions  should  be  used. 

In  general,  if  N has  r^,r2,,..,r^  as  factors,  which  are  relatively 
prime,  the  N-point  DFT  matrix  can  be  represented  as  in  Eq.  ( 2 * 13) 


S = S * S * 
1 z 

c = C]  * c2  * 


T = T*T*  * T 

i i ^ 1 2 • • • 1 
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If  = number  of  additions  required  for  an  r^  -point  DFT,  i = 1,2,...,L 


M.j  = number  of  multiplications  required  for  an  r..  - point  OFT,  i = 1,2,...,L 


Using  this  algorithm,  the  N-point  DFT  can  be  obtained  in 
and 


multiplications 


N + N 


M 

N 


k 

k 


additions. 


n 


CHAPTER  III 

SHORT  LENGTH  DFT  ALGORITHMS 

The  central  idea  behind  deriving  efficient  algorithms  for  computing  the 
DFT  of  short  length  sequences  is  to  reduce  the  problem  to  one  of  evaluating 
cyclic  convolutions.  It  was  shown  by  Rader  [8]  that  if  N is  a prime,  the 
N-point  DFT  can  be  viewed  as  an  (N-l)-pcint  circular  convolution.  When  N 
is  a prime  power  (i.e.,  N = pr,  p/2),  the  N-point  DFT  can  be  obtained  by 
computing  a (P-1 ) * p point  convolution  and  two  p point  convolutions  [9]. 

Efficient  algorithms  exist  for  performing  circular  convolution.  These  can 
in  turn  be  employed  to  compute  the  DFT,  after  replacing  it  with  a convolution 
problem.  In  order  to  derive  these  algorithms  it  is  best  to  employ  polynomial 
multiplication  techniques.  To  this  end,  consider  the  problem  of  obtaining 
the  circular  convolution  of  two  sequences  c^,  n = 0,1,..., N-l  and  bn, 
n = 0,1 , . . . ,N-1 , defined  by 

N-l 

cn  = S akbn-k  ’ n 3 0,1 , . . . ,N-1  (3-1) 

k=0 

where  the  indices  are  evaluated  modulo  N.  Eq.  (3-1)  can  be  viewed  as  a poly- 
nomial multiplication  problem.  That  is,  if 

N-l 

p(x)  = ^ a.x1 
i=0 


N-l 

q(x)  = ^ bix1 

i=0 


12 


and 


then 

Y ( x ) = p(xj  • q(x)  mod  (xN  - 1)  (3-2) 

For  small  values  of  N,  the  coefficients  of  the  polynomial  y(x)  can  be  com- 
puted efficiently  as  explained  below: 

k 

Let  T(x)  = xN  - 1 = [j  T^(x)  be  the  decomposition  of  T(x)  into  its 

i=l 

irreducibles.  By  the  Chinese  Remainder  Theorem,  the  coefficients  of  r(x) 
can  be  obtained  from  those  of  r^(x)  = p(xj  • q(x)  mod  T^x),  i = 1,2,...  ,k. 

It  is  shown  in  [10]  that  the  minimum  number  of  multiplications  needed  to 
compute  the  coefficients  of  r(x)  in  this  way  is  2N-k.  When  N is  small, 
it  is  possible  to  achieve  this  minimum,  but  for  large  values  of  N , the 
number  of  additions  required  will  be  very  large  and  this  method  becomes 
inefficient. 

Algorithms  for  finding  the  OFT  of  short  sequences  have  been  derived  and 
are  given  in  [3]  and  [4].:  Efficient  algorithms  for  11  and  13-point  DFT's 
are  presented  here.  A more  efficient  algorithm  for  9-point  OFT  than  those 
in  [3]  and  [4]  is  also  presented.  Other  known  short  transforms  [3]  are  also 
listed. 
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ALGORITHMS 

In  the  algorithms  given  below,  x(i),  i=0,  1,  ....  N-1,  represents  input 
data  for  N-point  OFT,  X(i),  i =0,  1,  ....  N-1,  represents  N-point  DFT  output, 
and  a-j , m^ , Cy  etc.,  are  variables  used  for  storing  intermediate  results. 

2  Point  DFT  Algorithm 
ITl^  = x(0)  + x(l) 
m2  = x ( 0)  - x(l) 

X(0)  = m1  X ( 1 ) = m2 

2 adds 

3 Point  DFT  Algorithm 

a1=x(l)  + x(2)  ml=Ial  c1=x(0)-m1 

a2  = x(l)-x(2)  m^  = 0.86603  a^ 

a3=  x(0)  + a1 

X(0)  = a3  X(l)  = c1-jm2  X(2)*c1+jm2 

1 multiply,  6 adds,  1 shift 

4 Point  DFT  Algorithm 
m^  = x(0)  + x ( 2 ) 

m2  = x(0)  - x(2) 
m3  = x(l ) + x(3) 
m4  = X(1 ) - x ( 3 ) 

X(0)=m1+m3  X(l)=m2-jm4  X(2)>m1-m3  X(3)=m2  + m4 

8 adds 
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5 Point  DFT  Algorithm 


a-|  ~ x(  1 ) + x(4) 

m1  = 0.95106  a5 

c1  = x ( 0 ) -m5 

a2  “ X(1 ) " x(4) 

m2  = 1.53884  a2 

c2  = c-j  + m4 

a3  “ x(2)  + x(3) 

m3  = 0.36327  a4 

c3  = C1  - m4 

a4  = x(2)  -x(3) 

m4  = 0.55902  ag 

m5  ~ 4 a7 

Wm3 

a5  = a2  + a4 

c5  = m2  " ml 

a6  - ai  - a3 

a7  = a]  + a3 
ag  = x(0)  +a7 

X(0)=a8 

X(  1 ) = cp  - jc4  X(2)  = 

c3-jc5  X(3)=c, 

X ( 4 ) = c2  + j c4 

4 multiplies,  17  adds 

, 2 shifts 

7 Point  DFT  Alqorithm 

a!  = x(l)  +x(6) 

m1  = 0. 16667  a? 

- x(0)  - 

cp  = + mp  + m3 

a2  = *U)  - x ( 6 ) 

= 0.79016  ag 

a3  = x(2)  +x(5) 

m3  = 0.05585  a? 

C3  = C1  " m2  ‘ m4 

a4  = x(2J  - x(5) 

m4  = 0.73430  a]0 

c4  = cl  -m3  + m4 

a5  = *(3)  + x(4) 

m5  = 0.44096  a^ 

C5  = m5  + m6  " fr,7 

a6  = x(3)  -x(4) 

mg  = 0.34087  a]2 

C6  = m5  ' m6  ’ m8 

a7  = al  + a3  + a5 

m7  = 0.53397  a]3 

C7  = "m5  ' m7  ’ m£ 

a8  = al  'a5 

m8  = 0.87484  a14 

a9  = ~a3  + a5 
a10  = 'al  + a3 

311  =a2  + a4'a6 
a12  = a2  + a6 

a13  = 'Va6 
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a14  = "a2  + a4 
a15  = x(0)+a7 

X(0,=a15  XO)  = c2-jc5  X(2)  =c3- jc6  X(3)  = c„  - jc7 

X(4)=c4  + jc7  X(5)  = Cj  + jCg  X(6)=c2  + jc5 

8 multiplies,  36  adds 


8 Point  DFT  Algorithm 

a]=x(0)+x(4)  m1  = x(0)-x(4) 

a2  = x(2)+x(6)  nu  = x(2)-x(6) 


a3  = x(l)  +x(5) 

m3  **  d-j  ^2 

a4  = x(31  + x(7) 

m4  = a1  “ a2 

a5  = x ( 1 ) - x ( 5) 

m5  = a3  + a4 

a6  = x(3)  -x(7) 

m6  " a3  “ a4 

m^  = 0.7071 (a 

mg  = 0.7071 (a, 

X(0)  = m3  + mg 

XU)  = c]  - jc3 

X(4)  = m3  - m5 

X(5)  =c2  - jc4 

2 multiplies  and  26  adds 


Wm8 

C4  = m2  - m8 

'V 

+v 

X(2)  = m4-jm6  X(3)  = c2  + jc4 
X(6)  = m4  + jm6  X(7)  = Cl+jc3 


9 Point  DFT  Algorithm 
a!  =x(l)+x(8) 
a2  = x ( 2 ) + x(7) 
a3  = x(3)  +x(6) 
a4  3 x(4)  + x(5) 
a5  = x(l ) - x(8) 


ml  = ag  + ai 

m2  = xO  - \ a3 
m3  = 0.86602500  alQ 

V?a9  + all 
m,  = 0,86602500  a. 


C0  = rn2’ml0'mll 

c-|  = m2  ” m9  + ml  1 

c2  = m2  + ml0  + m9 
c3  = m^  + + mg 

V-VV's 
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a6  = x(2)  - x(7) 

mg  = 0.3420200  (a5  + ag) 

C5  = m5  + m6  ' 

a?  = x(3)  -x(6) 

m7  = 0.9848080  (ag  + ag) 

a8  = x(4)  -x(5) 

mg  = 0.6427880  (ag  - ag) 

ag  = a1  +a2  + a4 

mg  = 0.9396930  (a^a^ 

a10  = a5  "a6  + a7 

m^  = 0.1736480  (a4  - a2) 

al  1 ~ a3 

= 0.7660440  (a^a^ 

X(0)  =m1 

X(l)  = c0-jc3  X(2)  = c-j  - 

■jc4  X(3)  = m. 

X(4)  = c2  - jc5 

X(5)  = c2  + jc5  X(6)=m4+jm3  X(7)=c. 

X(8)=c0  + jc3 

8 multiplies,  2 shifts 

and  42  adds 

11  Point  DFT  Algorithm 

a]  = x(l)  + x(10) 

mQ  = x(0)  +a]3 

co  mo  “ ml 

a2  = x(2)  +x(9) 

m1  = 1 .10  (a13) 

Crm3+m4 

a3  = x ( 3 ) +x(8) 

m2  = 0.33166250 

VVm5 

a4  = x ( 4 ) + x(7) 

(a14  "a15  ' a9J 
m3  = 0.51541500  (a2  - a4) 

C3  = m3  ‘ m5 

ag  = x(5)  + x(6) 

m4  = 0.941253500  (a-,  - a4) 

Wm7 

a6  = x(l)  -x(10) 

m5  = 1.41435370  (a2  -a^ 

C5  = Vm8 

a?  = x(2)  - x(9) 

mg=  0.85949300  (a5  - a4) 

C6  = Vm8 

ag  = x(3)  - X\8) 

m?  = 0.04231480  (a3  - a4 ) 

C7  = m10  + mll 

ag  = x(4)  - x(7) 

mg  = 0.3863928C  (a&  - a3) 

c8 r m9 + mn 

a10  " x(5)  “ *(6) 

mg  = 0.51254590  (a2  -ag) 

C9  = ml?+m14 

311  = a1  +a2 

m10  = 1.07027569  (a]  - a3) 

C10  = m12‘m14 

a!2  = a3  + a5 

mn  = 0.55486070  (a12-an 

) c]1=m16  + m17 

a13  = a4  + all  +a12 

m12  = 1.24129440  (a?  + ag) 

C12  = m15'm17 
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a14  = a7'a8 

m13  - 0.20897830  (ag  - ag) 

C13  = m19  + m20 

a15  = &6  + a10 

m]4  = 0.37415717  (ag  + a7) 

C14  = m18“m20 

m15  = 0.04992992  (a9-a10) 

C15  = C5  + C7  + C0 

m]6  = 0.65815896  (ag-a9) 

C16  = C0_C2‘C7 

= 0.63306543  (ag-a^g) 

C17  = C0  + C6  + C8 

m18=  1.08224607  (a?  + a1Q) 

C18  ” C0  ~ C3  " C8 

tH-j g - 0 . 81 720738  (ag“ag) 

C19  = C0  + C1  "c4 

x(0)  =mQ 

m20  = 0.42408709  (a14  + a15) 

C20  = VC11  +C13 
C21  = C13‘  C9"m2 
C22  = m2  + Cl4  + Cl2 
C23  = C12“ClO'm2 

C24  = C20  " C21  + C22 

x0)  = ci9+jc24 

x(2)  =c15  + X(3)  =clg  + jc2i 

X(4)  = c]7  - jc22 

X(5)  = c18  + jc23 

x(6)  =ci8- J'c23  X(7)  = c17  + jc22 

X ( 8 ) = c]6  - jc2i 

X(9)  = C15  ~ JC20 

X(10)  = c]g-  jc24 

20  multiplies  and  83 

adds 

13  Point  OFT  Algorithm 

a!  = x(l ) + (12) 

mQ  = x(0)  + a15 

co = mo ' nl 

a2  = x ( 2 ) + x(ll ) 

m-|  * 1.08333333  a15 

Cpm^iDg-  m2 

a3  = x( 3)  + x(10) 

m2  ~ 0. 30046261  (a-j^a-j^) 

c2  = m7  + mg  + m2 

a4  = x(4)  + x(9) 

m3  = 0. 74927933  a16 

C3  = m8  " m6  ' m2 

a5=x(5)  + x(8) 

m4  = 0.40113213  a]7 

C4  = C0"m9  + ml0 

a6  = x ( 6 ) + x(  7 ) 

mg = 0. 57514073  (a-j g + a-j  ^ ) 

c5  = co'mio"mn 
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a7  = x(l)  - x ( 1 2 ) 

= 0.52422664  a22 

c6  = c0  " m9  + mn 

a8  = x(2)  - x(ll ) 

m 7 = 0.51652078  a23 

c7  = m12  "m14 

a9  = x(3)  -x(10) 

m8  = 0.00770586  (a-2  + a23^ 

c8  = m13  "m14 

a10  ~ ~ x ( 9 ) 

nig  - 0.42763400  a24 

C9  **  ml  5 ~ mi  7 

a]  ] ~ x ( 5 ) - x(8) 

m^r  = 0.15180600  a23 

c10  = m16  *m17 

a12  = “ x ( 7 ) 

m-ji  * 0.57944000  (^£4  ” a25^ 

C11  B m18  ” m20 

a]3  = a2  + a5  + ag 

m-i  r,  “ 1 . 1 5439500  a0/. 

1 c do 

C12  = m19  + m20 

a14  = al  +a3  + a4 

= 0.9065220  a27 

c13  = m3“m5 

a1 5 = a13  + a14 

= 0.81857030  (a2,  + a2/) 

c14  a m4  ” m5 

a16  = a8  + a11  ! a12 

m15  = 1.19713680  a2n 

C15“C1+C4 

al 7 = a7  + a9  " alO 

m16  » 0.86131170  a2Q 

C16“C2  + C5 

al8  = a2  “ a6 

m^7  = 1 .10915485  (a28  + a29^ 

C17  * c5  ~ c2 

al  9 = a3  " a4 

ni-j8  = 0.04274140  a^p 

C18“  C3  + C6 

a20  = al  " a4 

mlg  = 0.04524049  a3] 

C19  * c4  ” C1 

a21  = a5  " a6 

m20  -•  0.29058500  (alc  +a31 ) 

C20  “ c6  ’ c3 

a22  = a18'a19 

c2)  c c4  + J(c7  - Cg) 

a23  = a20  “ a21 

C22  = C12  ” C13  “ c10 

a24  = a18  + a19 

C23  = C7  + C11  +C14 

a25  = a20  + a21 

C24“C9  + C11  " c14 

a26  = a8"a12 

c25  * c8  " c12  ~ c13 

a27  = a7"a9 

C26*C3+J(C8-C10: 

a28  = a8"all 

a29*a7  + a10 

a30  = a11  "a12 

a31  " "a9  ‘ a10 
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X ( 0 ) = mQ 


x(l)  - c15  + c21 

X(2)  - C-jg  + j c22 

X(5)  = c19  + Jc25 

x(6)  = c20  - c2g 

X(9)  - c-jg  + 

x(10)  = c]7  + jc23 

20  multiplies  and  94 

adds 

X ( 3 ) - c-jy  + JC23  x(4)  = c18  " JC24 
X(7)-C2q  + c26  X(8)  = cig  “Jc25 

X(ll)  - cig  - j c22  X02J  = - C21 


20 


CHAPTER  IV 

FIXED  POINT  PFA  AND  WFTA  ERROR  ANALYSIS 

The  DFT  algorithms  are  often  implemented  by  special  purpose  digital  hard- 
ware using  fixed  point  arithmetic.  Accuracy  requirement  is  one  of  the  impor- 
tant factors  which  influences  the  decision  about  the  word  size  of  such  implemen- 
tations. Therefore,  it  is  desirable  to  estimate  the  roundoff  noise  generated 
in  the  DFT  computation.  The  effr-ct  of  fixed  point  arithmetic  on  the  roundoff 
noise  in  FFT  computations  has  been  studied  in  [11]  and  [12].  An  estimate  of 
the  roundoff  noise  in  the  case  of  the  PFA  and  the  WFTA  is  obtained  here  using 
a statistical  model. 

Addition  and  multiplication  by  constants  are  the  only  two  arithmetic 
operations  needed  to  implement  the  DFT  algorithms.  If  the  input  data  is 
properly  scaled  to  avoid  overflow  during  additions,  no  error  will  be  introduced 
in  the  DFT  output  due  to  addition  operations.  However,  when  two  fixed  point 
numbers  are  multiplied,  the  result  has  to  be  rounded.  This  introduces  roundoff 
error  in  the  DFT  output.  To  model  the  effect  of  rounding,  an  additive  noise 
source  is  associated  with  each  real  multiplication.  The  model  for  fixed-point 
multiplication  is  shown  in  Fig.  3.  Each  roundoff  noise  (error)  sample  e is 


Fig.  3 A model  for  fixed-point  multiplication  operation. 


modelled  as  a random  variable  with  probability  density  function  as  shown  in 


T 
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Fig.  4.  Furthermore , it  is  assumed  that  the  error  introduced  by  each  multi- 


plication operation  is  statistically  independent  of  all  other  errors  and  of 
the  input. 

Fig.  5 shows  a roundoff  noise  model  for  an  N-point  short  length  DFT  algo- 
rithm. The  error  vector  E is  defined  as 


E 


Fig.  5 Roundoff  noise  model  for  an  N-point  short  length  DFT  algorithm. 

E = (e1  e2  ...  eM)  (4-1) 

where  e^(i  = 1,2,...,M)  represents  roundoff  error  due  to  multiplication  by  the 
constant  C(i,i).  It  should  be  noted  that  if  jC(i,i)|  = 1,  then  for  all  inputs 
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e.  = 0 . Furthermore,  if  the  input  data  is  complex,  the  variance  of  non-zero 

2 

components  of  E is  2ag  . 

1.  WFTA 

In  the  WFTA,  the  N-point  DFT  relation  is  expressed  as 

X*  = S(CTx'  + E)  (4-2) 

I I 

where  S,  C,  T,  x , and  X are  as  defined  in  Eq.  (2-18),  and  E is  an  (MX1) 
error  vector  (See  Fig.  5).  Therefore 

error  in  the  DFT  output  = SE  (d*3) 


clearly 


E [SE]  = 0 


(4-4) 


where  E [ > ] is  the  statistical  expectation,  and  total  mean  square  error  in 
the  DFT  output  (TMSE) 

= E E iso.jii2  • (4.6) 

>=i  j=i 

|c(j,j)|  t i 


Let 


N M 


EE  !s(i,j)|2 
1=1  j=l 


(4-6) 


Table  1 lists  the  values  of  P for  several  short  length  transforms. 
Furthermore,  let  li'  = ^ x r^  x ...  x r^,  and  r^ , -v.,  ...»  r^  be  relat/vely 


prime.  Since 
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S = S, 


* c * 
*2 


(4-7) 


it  follows  that 


N M 

EE  |S(i,j)|2  = N P1  P2  ...  PL  (4.8) 

i=l  j=l 


where  P.  is  as  defined  in  Eq.  (4.6)  for  N = r. . For  all  values  of  N,  it 
can  be  verified  that 

C(1 ,1 ) = 1 (4-9) 

<*nd  S(i  ,1 ) = 1,  i * 1,2,.  ..,N  (4-10) 


Using  Eqs.  (4  8)  - (4  10),  it  can  be  shown  that 
TMSE  < 2 r o‘  - 1 ) 


(4-11) 


or,  equivalently. 


TMSE  < 2 K,  N2  o2 
— I e 


(4-12) 


where 


K1  = 


P1P2 


- 1 


(4-13) 


It  is  interesting  to  note  that  the  roundoff  noise  does  not  depend  on  the 
order  in  which  short  length  transforms  are  combined  to  obtain  longer  length 


transforms. 
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2.  PFA 

To  include  the  effect  of  rounding  in  the  DFT  computation  by  PFA,  Eqs .(2*12) 
and  (2*13)  have  to  be  modified,  and  these  modified  equations  can  be  expressed 


in  the  matrix  form  as 


x(0,0) 

x(l ,0) 


x(0,l ) 
x(l,l) 


x(0,r2  - 1) 
x(l,r2-  1) 


x ( r -j  - 1 ,0)  x(r1  -1,1) 


x(r1  - 1,  r2  - 1) 


= S1  C1  T1  A*  + S,  • [E^l)  E,(2)  ...  E] (r2)] 


(4-14) 


x(0,0) 

x ( 1 ,0) 


A = $2C2T2 


x(0,l) 

x(l.l) 


x(0,r^-l ) 
x( i ,r,-l ) 


x(r2-l,0)  x(r2-lj) 


x ( ^ ) 


Eo(D 


E2(2) 


E2(rV 


(4.15) 


respectively,  where  A1  = transpose  of  matrix  A,  and  E ^ ( i ) ( i = l ,2, . . . ,r2) 
and  F2  (i)  (i  = l ,2, . . . ,r^ ) are  error  vectors  resulting  from  r2  r^-point 
OFT  computations,  and  r2  r^-point  OFT  computations  respectively.  It  can  be 
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shown  that 


Expected  error  at  the  output  = 0 


(4.16) 


Since  the  error  vectors  are  uncorrelated. 


ri  Mi  , 

£ £ 

i=l  j=l 


\ if  if  |S2H.J)I2  | •2®e  (4J7) 

i=l  j=l  J 

|C2(j  1 


= 2Noe2  + r}  q2)  (4.18) 

where 

ri 

qi  = FT  Z E iSi^‘kH2.  i = 1.2  (4.19) 

1 j=l  k=l 

|C1(k.k)|j*l 


Table  1 shows  the  values  of  q for  several  short  length  DFT's. 

If  N has  r^,  r^,  ....  rL  as  L mutually  prime  factors,  Eq.  (4.18) 
can  be  generalized  as 


2Noe  (q]  + r1  q2  + .. 


rl  r2 


L-l 


qL} 


(4.20) 


TMSE 
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2 2 
= 2K,  IT  o* 
2 e 


(4.21) 


where 


\ + qL-1 
rL  rL  rL-l 


+ 


rL  rL-l 


(4.22) 


It  is  clear  from  Eq.  (4.20)  that  the  roundoff  noise  depends  on  the 
order  in  which  the  short  length  DFT's  are  performed.  In  Eq.  (4.18)  the 
TMSE  is  minimized  if  r,  and  r^  are  selected  such  that 


r2  ql 


+ q2  < rl  q2  + q. 


(4.23) 


or 


H1  < h2 
Vt  - VT 


(4.24) 


In  general,  the  TMSE  in  Ea.  (4.20)  will  be  minimum,  if 


where 


= r.-i  * i = 1 .2,  . . . , L 


(4.25) 


(4.26) 


For  short  length  transforms,  the  value  of  V is  listed  in  Table  1.  The 
factors  of  N should  be  ordered  according  to  the  size  of  V to  minimize 


the  roundoff  error. 
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Similar  results  are  obtained  in  [12]  for  the  FFT  case  and  are  given  below: 


Expected  error  in  the  output  - 0 


(4.27) 


and 


TMSE  = 2 ae  (V  “ N + f) 


(4.28) 


where  N is  the  transform  size.  For  large  values  of  N , 


TMSE  = 2 a2  k3  N2  (where  k3  = 1) 


(4.29) 


Table  2 lists  the  values  of  k^  and  k£  , defined  in  Eqs.  (4.13)  and  (4.22) 
respectively,  for  several  long  length  transforms.  By  referring  to  Table  2 
and  Eq.  (4.29),  it  can  be  concluded  that  the  fixed  point  roundoff  noise  in  all 
the  three  algorithms  will  be  of  the  same  order  of  magnitude. 


N 

kl 

k2 

120 

0.22 

0.21 

240 

0.22 

0.15 

1,008 

0.22 

0.15 

4,095 

0.52 

0.76 

8,190 

0.26 

0.38 

16,380 

0.19 

0.19 

32,760 

0.18 

0.22 

65,520 

0.17 

0.18 

Table  2.  Table  of  k-  and  k3  for 


several  values  of  N. 
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CHAPTER  V 

COMPARISON  OF  DFT  ALGORITHMS 


Let  N = r-| ,r„ rL,  where  are  mutually  prime. 

A..  = number  of  additions  required  to  compute  r. -point  DFT 
M.j  = number  of  multiplications  required  to  compute  r^ -point  DFT 


Table  3 lists  the  operation  counts  for  several  short  length  DFT's. 

As  discussed  earlier,  the  number  of  additions  required  to  compute  the 
DFT  using  the  WFTA  depends  on  the  order  in  which  short  length  transforms  are 
combined.  In  the  following  discussion,  without  loss  of  generality,  it  is 
assumed  that  r^  ,r,,, . . . ,r(  (with  r^  as  innermost  factor)  is  an  ordering  which 
minimizes  the  number  of  additions.  Throughout  the  discussion,  radix-2  FFT 
algorithm  is  implied  whenever  reference  is  made  to  the  FFT  algorithm. 

1.  Number  of  arithmetic  operations  (for  complex  data) 

(i)  FFT  [9]:  Let  N = 2m,  for  some  positive  integer  m,  and  N»  2 

No.  of  real  additions  = 3Nlog!!  - 3N 

2 (5.1) 

N 

No.  of  real  multiplications  = 2 N 1 og2  - 6N 


(ii)  WFTA 


No.  of  real  additions 


J i‘  — i 


j = i+l 


V 

V 


L 


i=l 


No.  of  real  multiplications  = 


30 


31 


(iii ) PFA 

No.  of  real  additions  = 2N 


No.  of  real  multiplications  = 


(5.3) 


Table  4 lists  the  number  of  multiplications  and  additions  required  to  compute 
several  longer  length  transforms  using  these  three  algorithms.  It  is  inter- 
esting to  note  that  some  of  the  transform  lengths  listed  fo^  the  PFA  and  WFTA 
are  close  to  powers  of  2.  It  can  be  seen  that  the  PFA  requires  fewer  number 
of  arithmetic  operations  than  the  other  two  algorithms,  for  wide  range  of 
values  of  N. 

2.  Memory  Requirement 

The  memory  required  for  implementing  the  DFT  algorithms  can  be  broadly 
classified  into  the  following  three  categories: 

a.  Data  memory 

b.  Coefficient  memory 

c.  Program  memory 

The  FFT  and  the  PFA  are  "in-place"  algorithms;  that  is,  new  results 
after  each  stage  of  computation  can  be  restored  over  the  data  used  to  compute 
the  results.  On  the  other  hand,  the  WFTA  is  not  an  "in-place"  algorithm  and 
requires  more  data  space  compared  to  the  other  two  algorithms.  In  fact,  the 
memory  size  required  is  approximately  equal  to  the  number  of  multiplications 
to  be  performed  in  the  computation  of  the  DFT  (see  Table  4). 

Table  4 also  lists  the  coefficient  memory  required  for  computing 
several  longer  length  transforms.  It  can  be  seen  that  the  number  of  coefficients 
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to  be  stored  is  significantly  less  for  the  PFA  compared  to  the  other  two 
algorithms.  The  sine-cosine  values  required  in  the  FFT  method  can  be  computed 
recursively  as  needed;  thus  savings  in  memory  space  can  be  achieved.  The 
disadvantage  of  such  a scheme  is  that  the  computation  time  is  increased  by 
about  15%.  Similar  savings  in  the  memory  requirements  can  be  achieved  in  the 
case  of  the  WFTA.  However,  such  implementations  could  be  very  inefficient  in 
terms  of  speed. 

Of  the  three  DFT  algorithms  being  considered,  the  FFT  program  requires 
minimum  space.  Besides  this,  input  and  output  re-ordering  is  very  systematic 
for  the  FFT,  where  as  they  are  less  so  and  may  require  storage  in  the  case  of 
the  other  two  algorithms.  The  computation  of  the  re-ordering  vectors  as  they 
are  needed  saves  storage,  but  is  less  efficient.  However,  by  using  a different 
input-output  re-ordering  scheme  and  adding  a small  amount  of  extra  hardware, 
in  the  case  of  special  purpose  hardware  implementations,  this  storage  space 
can  be  saved.  This  is  discussed  further  in  Chapter  VI. 

3.  Programming  Complexity 

Programming  of  the  FFT  is  iruch  simpler  compared  to  the  other  two  algorithms. 
This  is  mainly  because  of  the  complicated  indexing  scheme  to  be  used  in  the 
PFA  and  the  WFTA.  To  illustrate  this,  a FORTRAN  program  for  120-point  PFA  is 
listed  in  the  Appendix.  This  can  be  compared  with  the  FFT  programs  given  in 
[1].  It  should  be  noted  that  the  WFTA  is  not  an  inplace  algorithm.  This 
further  complicates  the  programming  of  WFTA. 

4.  Effect  of  finite  word-length  arithmetic 

The  use  of  finite  precision  arithmetic  in  the  DFT  computation  introduces 
error  in  the  output.  The  effects  of  finite  register  length  in  FFT  calculations 
is  discussed  in  [1,  5,  11,  12].  Because  of  the  complicated  structure  of  the 


34 


PFA  and  the  WFTA,  it  is  difficult  to  analyze  the  effects  in  these  algorithms. 
The  PFA  and  WFTA  require  fewer  arithmetic  operations  compared  to  the  FFT. 

It  is  very  likely  that  the  floating-point  DFT  computation  by  these  methods  will 
introduce  smaller  error  than  in  the  case  of  the  FFT.  By  computing  the  coeffi- 
cients needed  using  higher  precision  arithmetic,  the  effects  of  coefficient 
quantization  can  be  reduced  in  all  the  methods.  The  effects  of  fixed  point 
arithmetic  in  DFT  computation  was  discussed  in  Chapter  IV. 
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CHAPTER  VI 

A HARDWARE  IMPLEMENTATION  OF  PRIME  FACTOR  ALGORITHM 

It  is  often  necessary  to  build  special  purpose  hardware  for  computing 
the  discrete  Fourier  transform.  With  the  availability  of  low  cost  micropro- 
cessors, it  is  now  economical  to  conceive  of  processor-based  hardware  struc- 
ture. The  WFTA  is  not  suitable  for  this  purpose  if  transforms  of  long 
sequences  are  required.  In  such  cases,  a choice  has  to  be  made  between  the 
FFT  and  the  PFA.  Multiplication  is  one  of  the  slower  arithmetic  operations 
in  processor-based  systems.  The  ratio  of  multiply  to  add  times  could  be  as 
large  as  10  to  15.  Therefore,  from  the  earlier  discussion  it  is  evident 
that  the  PFA  is  better  suited  for  this  purpose  than  the  FFT.  Furthermore, 
there  are  certain  other  advantages  in  using  the  PFA  for  hardware  implementa- 
tion, and  this  will  be  made  clear  soon. 

A simple  block  diagram  of  PFA  hardware  is  as  shown  in  Fig.  6.  The  dia- 
gram is  self-exDlanatory.  The  coefficients  are  stored  in  the  read  only 
memory  (ROM).  The  initial  and  final  reordering  vectors  are  also  stored  in 
the  ROM.  The  DFT  algorithm  is  implemenced  at  the  microprogram  level  to 
increase  the  speed  of  the  system.  The  input-output  section  is  not  shown 
in  the  diagram. 

This  system  can  be  speeded  up  further  by  adding  a few  inexpensive  hardware 
blocks  to  it.  By  providing  a small  number  of  high  speed  storage  registers 
(a  maximum  of  64  words  is  sufficient  for  N as  large  as  720,720)  it  is  pos- 
sible to  reduce  the  number  of  accesses  to  the  data  memory.  The  intermediate 
results  during  computation  of  short  length  DFT's  can  be  stored  in  this  fast 
memory.  If  N has  L factors,  then  each  data  point  is  accessed  (for 
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Figure  6.;  A Block  Diagram  Of  PFA  Hardware 


reading  and  storing  the  result  after  each  stage  of  computation)  2L  times 
and  therefore,  the  data  memory  is  accessed  only  2NI.  times.  It  should  be 
noted  that  the  use  of  a fast  memory  like  this  will  also  reduce  the  number  of 
data  memory  accesses  ,-n  systems  implementing  the  FFT  [5]  and  the  WFTA.  By 
using  2\! N (or  2/2N  if  / N is  not  an  integer)  fast  memory  locations,  the 
number  of  data  memory  accesses  in  the  FFT  can  be  reduced  from  2 log^  to 
2 + log^  [or  1 + log^  if  SQRT(N)  is  not  an  integer].  In  the  case  of  WFTA, 
the  number  of  data  memory  accesses  required  is  given  by  the  following  expression 
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2 + 


4EE(w* 


i=2  j=i 


where  N = N-j  ,No, . . . ,NL  and  (6.1) 

= No.  of  multiplies  required  for 
N-point  DFT. 


Table  5 shows  a comparison  of  this.  It  is  interesting  to  note  that,  for  a 
given  transform  size,  the  PFA  requires  the  least  numter  of  memory  accesses. 


N 

FFT 

PFA 

WFTA 

4K 

14 

8 

19 

8K 

14 

10 

23 

16K 

16 

10 

23 

32K 

16 

10 

23 

64  K 

18 

10 

26 

Table  5.  No.  of  data  memory  access  per  point  in  the  three  algorithms. 

Sometimes  the  number  of  slow  memory  accesses  can  be  further  reduced  by 
using  the  WFTA  to  combine  several  shorter  DFT  algorithms.  For  example,  a 
120-point  DFT  can  be  implemented  with  8 and  15  as  factors  of  120.  The  WFTA 
can  be  used  to  obtain  tne  15-point  DFT  algorithm  from  3 and  5-point  DFT 
algorithms.  By  doing  so,  the  number  of  arithmetic  operations  are  not 
increased,  but  the  number  of  data  memory  accesses  is  reduced  by  about  30  per 
cent  (compared  to  the  120-point  DFT  implementation  with  3,  5,  and  8 as  factors). 
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The  coefficients  in  the  PFA  are  either  purely  real  or  purely  imaginary. 

This  fact  can  be  used  to  speedup  the  system  further  by  using  an  extra  arith- 
metic and  logic  unit  (ALU).  The  width  of  microinstruction  will  have  to  be 
increased  by  a few  bits  to  generate  the  additional  control  signals  needed. 

A modified  block  diagram  is  shown  in  Fig.  7.  The  real  and  imaginary  parts 
of  the  data  are  processed  separately.  Whenever  the  coefficient  to  be  multi- 
plied is  real,  there  will  not  be  any  interaction  between  the  two  ALU's,  but 
if  the  coefficient  is  imaginary,  the  results  after  the  multiplication  are 
exchanged  between  the  two  ALU's. 

Two  I/O  registers  I0REG1  and  I0REG2  are  used  for  this  purpose.  The  ALU1 
and  ALU2  can  load  registers  I0REG1  and  I0REG2,  and  read  from  registers  I0REG2 
and  I0REG1  respectively.  The  system  shown  in  Fig.  7 can  be  thought  of  as  two 
identical  processors  working  in  parallel  and  controlled  by  a single  controller 
(CCU).  The  addresses  of  I0REG1  and  I0REG2  for  SYS1  are  identical  to  the  addres- 
ses of  registers  I0REG2  and  I0REG1  respectively,  for  SYS2.  The  other  blocks 

in  Pig.  7 are  self-explanatory. 

Let  X.  be  the  number  of  multiplications  by  imaginary  coefficients  (in- 
cluding coefficients  + j 1 Jin  an  l^-point  OFT  computation.  Then  the  number  of 
a exchanges  between  the  two  ALU's  is  given  by  the  expression 

^Sx/N.)  (6.2) 

i=l 

The  values  of  X's  for  different  short  length  DFT's  is  shown  in  Table  6. 
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As  mentioned  earlier,  2N  ROM  locations  are  needed  to  store  input  and  output 
reordering  vectors  in  the  PFA.  However,  by  using  a different  scheme  to  con- 
vert the  sequences  Xn  and  x^  , in  Eq.  (1.1),  into  multidimensional  arrays, 

N memory  locations  can  be  saved.  To  explain  further  let  N = r^r^  and 
GCD(r-j,r2)  = 1.  The  indices  n and  k in  Eq.  (1.1)  can  be  expressed  as: 

n s n]r2si  + n2rls2  ^mod  n = (6.3) 

k = kjr2sl  + ^2rlS2  ^mod  ^ k = 0,1,...,N-1  (6.4) 

where  n^ , n^,  k-j,  k^,  s^  and  s^  are  solutions  of 

n.|  = n (mod  r^ ) 

n?  r n (mod  r?) 

k|  = k (mod  r^ ) 

k2  = k (mod  r2)  (6.5) 

r^  = 1 (mod  r^ ) 

and  r1s2  = 1 (mod  r 2) 

respectively.  By  the  Chinese  remainder  theorem, 

nk  2 n^s^  + n2k2S2rl  (6.6) 

Representing  the  sequences  x.  and  X in  Eq.  (1 . 1 ) as  two  dimensional  arrays 

»\  n 

and  using  Eqs.  (6.4)  - (6.6),  the  OFT  relation  in  Eq.  (1.1)  can  be  rewritten  as 
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r„-l 

r-,-1 
s2k2n?  « " 

L 

« 

II 

s,  k..  n, 

Wr  ' 1 x(kr  k2) 

(6.7) 

k2=0 

2 kfO 

r2-l 

ve 

k,n. 

WrJ  1 x(kr  k2) 

(6.8) 

k2=0 

2 k^O 

where  and  P 2 are  permutation  matrices  and  the  elements  of  P^  ( or  P^) 
depend  only  on  the  numbers  s^  and  r2  (or  s^  and  r^).  A simple  modification 
of  the  short  length  DFT's  can  take  care  of  these  permutation  matrices;  thus  a 
saving  of  N memory  locations  can  be  achieved.  These  mapping  vectors  can  also 
be  computed  as  and  when  needed,  without  affecting  the  system  speed,  by  using 
a few  extra  hardware  blocks  (such  as,  counters,  adders)  P 3Q . This  scheme  is 
useful  only  when  N is  large. 


N 

X 

2 

0 

3 

1 

4 

1 

5 

3 

7 

4 

8 

3 

9 

5 

11 

10 

13 

13 

16 

8 

Table  6.  No.  of  imaginary  coefficients  in  short  length  DFT  algorithms. 
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CONCLUSION 

Efficient  algorithms  exist  for  computing  the  DFT  of  long  sequences,  when 
the  sequence  length  is  a composite  number.  The  FFT,  the  PFA  and  the  WFTA  are 
three  such  algorithms.  In  this  report,  various  aspects  of  these  algorithms 
were  discussed.  Efficient  algorithms  for  11  and  13-point  DFT's  were  pre- 
sented. Using  these  and  the  other  short  length  transforms,  the  DFT  of  very 
long  sequences  can  be  obtained  by  the  PFA  and  the  WFTA,  in  fewer  number  of 
multiplications  than  in  the  FFT. 

In  the  PFA,  the  DFT  of  a long  sequence  is  obtained  by  performing  a number 
of  short  length  DFT's.  This  fact  can  be  used  to  design  high-speed  dedicated 
hardware  for  DFT  computation.  Moreover,  the  PFA  requires  fewer  arithmetic 
operations  (i.e.,  combined  additions  ar.d  multiplications).  Hence,  it  is 
expected  to  introduce  smaller  error  due  to  finite  word  length  arithmetic. 

The  FFT  requires  fewer  additions  than  the  other  two  algorithms,  but  the 
number  of  multiplications  needed  is  considerably  greater.  It  is,  however, 
important  to  note  that  the  FFT  lends  itself  to  more  systematic  programming. 

The  WFTA  requires  the  least  number  of  multiplications  among  the  three 
algorithms.  However,  the  number  of  additions  required  is  slightly  more  than 
the  others  for  transform  sizes  up  to  few  thousands  and  becomes  formidable 
for  very  long  transforms.  Furthermore,  it  requires  more  data  and  program 
memory  than  is  required  for  the  other  two. 
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APPENDIX 


This  Appendix  lists  a FORTRAN  program  for  obtaining  the  DFT  of  120-points, 
by  the  PFA.  By  making  minor  modifications  at  the  places  indicated,  this  pro- 
gram can  be  used  to  implement  all  the  3-factor  PFA's. 
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120-POINT  DFT  ALGORITHM 


$UjATIVBA&V&5& 


N 

Nl.  N2, N3 
K1  * N2*N3 
X 


TRANSFORM  SIZE 

MUTUALLY  RELATIVE  FACTORS  OF  N 
COMPLEX  ARRAY  OF  DIMENSION  N 


INTEGER  Nl.  N2, N3, STADR,  STEP, TSZE 

THE  FOLLOWING  3 STATEMENTS  ARE  TO  BE  MODIFIED  IF  N IS  MODIFIED 
COMPLEX  XU20J 

COMMON  /STORE1/X, STaDR, STEP, TSZE  /ST0RE2/N,  Nl , N2,  N3,  K1 
DATA  N,  N1.N2,  N3,  Kl/120,  3,  5,  8,  40/ 

READ  DATA  FROM  INPUT  FILE  IN  THE  REQUIRED  ORDER 

DO  t I a 1 , N 
IND~i 

K « IROM< IND) 

READ  <21, 2>  X(K) 

FORMAT  OF  INPUT  FILE 

FORMAT <2F) 

STARTING  OF  THE  PFA 

PERFORM  N1*N2  N3-P0INT  DFTS 

STEP  * 1 
TSZE  * N3 

DO  3 STADR  » 1.N-N3+1.N3 
CALL  SHORTR 

PERFORM  Nl  »N3  N2- POINT  DFTS 

STEP  « N'J 
TSZE  - NS 
ICNT  - 0 

DO  4 I - 1,  Nl 
00  9 J * 1 . STEP 
STADR  » ICNTfU 


noon  non  n n n n n n n n co  n n n &-  n nnn^ui 
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PIS  PAGE  IS  SfiST  QUALIIT  PRACTICABLE 
JSKtif  OOPY  FURNISHED  TO  DDC 

CALL  SHORTR 
ICNT  = ICNT+K1 

PERFORM  N2*N3  N 1 -POINT  DFTS 

STEP  - K1 
TSZE  =*  N1 

DO  6 STADR  = l.Kl 
CALL  SHORTR 

REORDER  THE  RESULT  AND  OUTPUT 

DO  7 T 1.N 
IND^=I 

K = IROfcKlNLM 
WRITE  (22.  8)  I , X t k > 

FORMAT ( / J 0 X » , 13.  ) \5x,Fi5.  8,  '.  '..F15.  8) 

STOP 

END 

THE  FUNCTION  ROM  HAS  TO  BE  MODIFIED  IF  THE  NUMBER  OF  FACTORS 
ARE  CHANGED. 

INPUT/OUTPUT  MAPPING  ROM  TABLE  LOOK-UP  SIMULATION 

INTEGER  FUNCTION  IROMd) 

COMMON  /3T0RE2/N, Ri, R2, R3, K1 
INTEGER  PI  R2,  R3.K1 
II  * 1-1 
N1  = MOD ( 1 1 > R 1 ' 

N2  « MOD(II. P2' 

N3  » MODI  II.  P3»-  1 
IRON  * NUKl*-M2*R3fN3 
RETURN 
END 

SHORT  LENGTH  DFT  ALGORITHMS 

SUBROUTINE  SHORTR 
IMPLICIT  COMPLEX  (A--H..J-Z) 

THE  FOLLOWING  STATEMENT  HAS  TO  BE  MODIFIED  IF  N IS  CHANGED 

COMMON  /STQRE1/X. /O, ISTP. ISZE 
DIMENSION  X ( 120) 

THE  FOLLOWING  STATEMENT  IS  TO  BE  CHANGED  IF  OTHER  SHORT 
LENGTH  TRANSFORMS  ARE  ADDED 

GO  TO  (1,  1.3  1,  3.  1.  1,8)  ISZE 
DATA  J/(0  0, 1 0)/ 


nnn  ui  non  nnann  n n r>  non  uoftn 
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3-POINT  DFT 

11  * IO+ISTP 

12  = Il+ISTP 

INPUT  ADDITIONS 


55K5SE' 


A1  * X(ll)+xa2) 
A2  = X < II )-X { 12) 
A3  * X < I0)+A1 

MULTIPLICATIONS 

Ml  = 0 5*A1 
M2  * 0 86603* A2 

OUTPUT  ADDITIONS 


OUTPUT  ORDERING  HAS  TQ  BE  MODIFIED  IF  FACTORS  OF  14  ARE  ("HANGED 

Cl  = XII0)-M1 

X(IO)  35  AO 

JM2  ■ -JTIMES(M2) 

X < 1 1 ) 3 CJ+UM2 
X ( 12 ) - C1-JM2 
RETURN 

5 - POINT  DFT 

11  * IO+ISTF 

12  * Il+ISTP 

13  - I2+ISTP 

14  * I3+I3TP 

INPUT  ADDITIONS 

A1  * X< 1 1 / +X  < I4> 

A2  » X ( I 1 )-X( 14) 

A3  * X ( 12 ) + X < 13) 

A4  - X ( I2)-X l 13) 

AS  » A2+A4 
A6  * A1-A3 
A7  * A1+A3 
A8  « X(I0)+A7 


nnn  moon  doo  ono 


MULTIPLICATIONS 

Ml  * 0. 95106*A5 
M2  * 1 53684-* A2 
M3  * 0,36327#A4 
M4  « 0 5S902*A6 
MS  * 0 25*A7 


7 X”.  "r  ,i  , f*  ' 


KS  Tim  IS  BIST  QUAW.XI  1 
fEXM  COPY  FUHttSffl®  WDB6 


OUTPUT  ADDITIONS 
Cl  * X (TO) -M5 

C2  * CHH4  - 

C3  » C1-N4 
C4  a Ml -M3 
C4  a ,JTIHeS<C4) 

C5  ® M2* M 1 " 

r3  » UTlMFISfCS' 

X(I0>  --  A8 

INDICES  OF  X TO  3E  MODIFIED  IF  FACTORS  OF  N ARE  CHANCED 

X < 1 1 > » Cl-K 4 
X < 12 ) --  CO+OS 
X < 13  > a C3-  C5 
X < 14 ) a C2-C4 

RETURN 

8 - POINT  TRANSFORM 

11  » I0+ISTP 

12  * It+ISTp 

13  * I 2+ IS TP 

14  » T3+1STP 

15  a I4-*  ISTP 

16  * ISilSIF 

17  a Ito+ISTP 

INPUT  ADDITIONS 

A1  • X ( 1 0 / + X 1 1 4 < 

A2  * X(I2»*X(I&) 

a3  « nin+ms) 

A4  » X( II >~X( IS) 

AS  * X<I3)+XtI?> 

A6  - X< I3>~X< 17) 

A7  » A1+A2 
A8  ■ A3+A5 


• -CTO.  - J» 


nnr.  •-  n >-t  n non  o no 


MULTIPLICATIONS 

MO 

- 

A7*  AS 

Ml 

A7-A8 

M2 

V 

A1-A2 

M3 

sr 

xi ro > - x<  i4  > 

M4 

as 

<A4-A6>*0  7 07 1 07 

M5 

* 

A 3- A 3 

M5 

IS 

-0TIMES<M5) 

M6 

sc 

X i 1 2 1 "•  X i 1^// 

M6 

a r 

-JTIMESIM6) 

M7 

s* 

t A4-*-A6 » <0  70/107 

M7 

as 

J’"  lMfc‘3  * M7 1 

OUTPUT  ADDITIONS 

Cl 

M3+M4 

C2 

V 

M3  M4 

C3 

m 

M6+M7 

C4 

w 

M6-M7 

50 


output  0Rr*e.Pti!.i  has  to  nr  modified  if  factors  of  n a*<f  • hanged 

X<10'  * MO 
X < X 1 > -v  Cl  C i 

x<  12'  * mp.-m^ 

X < 13'  - Cc’-rC- 
X<I4>  *•  Ml 
XU"5>  « C2-C4 
X ( 16  ) *-  M2-rM‘j 
X ( 17 ) « Cl-CJ 
RETURN 
RETURN 
END 

MULTIPLICATION  Cir  m COMPLEX  CONSTANT  B »'  01 

COMPLEX  FUNO 7 t ON  JTIKESIA) 

COMPLEX  A 
D * REAL*  A • 

C » -A I MAC ' A ) 


I JTIMEfj  *•  CIX'Lau.  •• 

RETURN 

["  ENO 


